Guessing morphological classes of unknown German nouns

Authors

  • Preslav Nakov
  • Yury Bonev
  • Galia Angelova
  • Evelyn Gius
  • Walther von Hahn

Abstract

This paper describes a system for the recognition and morphological classification of unknown German words. Given raw text, it outputs a list of the unknown nouns together with hypotheses about their possible stems and morphological class(es). The system exploits both global and local information, as well as morphological properties and external linguistic knowledge sources. It learns and applies ending-guessing rules similar to those originally proposed for POS guessing. The paper presents the system design and implementation and discusses its performance on the basis of an extensive evaluation. Similar ending-guessing rules have also been applied to Bulgarian, but performance there is worse, owing both to the difficulty of noun recognition and to the highly inflexional morphology with its numerous ambiguous endings.
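
The abstract does not give the rule format, so the following is only a minimal sketch of what learning and applying ending-guessing rules could look like, not the authors' implementation. It assumes a rule maps a word ending to the morphological class that dominates among known nouns with that ending. The function names, the thresholds `min_count` and `min_conf`, the class labels (`f_en`, `n_0`, `m_0`, `f_uml`), and the toy lexicon are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon: (noun, morphological class label).
# The labels are invented for this sketch, not taken from the paper.
TOY_LEXICON = [
    ("Zeitung", "f_en"), ("Wohnung", "f_en"), ("Meinung", "f_en"),
    ("Freiheit", "f_en"), ("Einheit", "f_en"),
    ("Mädchen", "n_0"), ("Brötchen", "n_0"),
    ("Lehrer", "m_0"), ("Fahrer", "m_0"), ("Mutter", "f_uml"),
]


def learn_ending_rules(lexicon, max_len=4, min_count=2, min_conf=0.8):
    """Return {ending: (class, confidence)} for frequent, reliable endings."""
    counts = defaultdict(Counter)
    for word, cls in lexicon:
        w = word.lower()
        # Collect endings of length 1..max_len, leaving at least one stem character.
        for n in range(1, min(max_len, len(w) - 1) + 1):
            counts[w[-n:]][cls] += 1

    rules = {}
    for ending, by_class in counts.items():
        cls, hits = by_class.most_common(1)[0]
        total = sum(by_class.values())
        # Keep the ending only if it is seen often enough and one class dominates.
        if total >= min_count and hits / total >= min_conf:
            rules[ending] = (cls, hits / total)
    return rules


def guess_class(word, rules, max_len=4):
    """Apply the longest matching ending rule; return None if no rule fires."""
    w = word.lower()
    for n in range(min(max_len, len(w) - 1), 0, -1):
        rule = rules.get(w[-n:])
        if rule is not None:
            return rule[0]
    return None


if __name__ == "__main__":
    rules = learn_ending_rules(TOY_LEXICON)
    for unknown in ("Rechnung", "Schönheit", "Kätzchen", "Bäcker"):
        print(unknown, guess_class(unknown, rules))
```

In this toy setting the ending "-er" is shared by two classes, so no rule passes the confidence threshold and the guesser abstains on "Bäcker". This illustrates, on a small scale, why numerous ambiguous endings make the approach harder.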

Related resources

MorphoClass - Recognition and Morphological Classification of Unknown Words for German

A system for recognition and morphological classification of unknown words for German is described and evaluated. It takes raw text as input and outputs a list of the unknown nouns together with a hypothesis about their possible morphological class and stem. MorphoClass exploits global information (ending-guessing rules, maximum likelihood estimations, word frequency statistics), morphological ...

A Corpus-based Approach to the Interpretation of Unknown Words with an Application to German

Abstract Usually a high portion of the different word forms in a corpus receive no reading by the lexical and/or morphological analysis. These unknown words constitute a huge problem for NLP analysis tasks like POS-tagging or syntactic parsing. We present a parameterizable (in principle language-independent) corpus-based approach for the interpretation of unknown words that only needs a tokeniz...

Morphological features help POS tagging of unknown words across language varieties

Part-of-speech tagging, like any supervised statistical NLP task, is more difficult when test sets are very different from training sets, for example when tagging across genres or language varieties. We examined the problem of POS tagging of different varieties of Mandarin Chinese (PRC-Mainland, PRC-Hong Kong, and Taiwan). An analytic study first showed that unknown words were a major source of ...

Automatic Rule Induction for Unknown-Word Guessing

Words unknown to the lexicon present a substantial problem to NLP modules that rely on morphosyntactic information, such as part-of-speech taggers or syntactic parsers. In this paper we present a technique for fully automatic acquisition of rules that guess possible part-of-speech tags for unknown words using their starting and ending segments. The learning is performed from a general-purpose l...

Unsupervised Learning of Word-Category Guessing Rules

Words unknown to the lexicon present a substantial problem to part-of-speech tagging. In this paper we present a technique for fully unsupervised statistical acquisition of rules which guess possible parts-of-speech for unknown words. Three complementary sets of word-guessing rules are induced from the lexicon and a raw corpus: prefix morphological rules, suffix morphological rules and ending-gu...

Publication date: 2003